Deep Learning for Mechanical Engineering

Homework 07

Due Monday, 11/06/2021, 4:00 PM


Prof. Seungchul Lee
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST
  • For your handwritten solutions, please scan or take a picture of them. Alternatively, you can write them in markdown if you prefer.

  • Only .ipynb files will be graded for your code.

    • Ensure that your NAME and student ID are included in your .ipynb file names. ex) IljeokKim_20202467_HW07.ipynb
  • Compress all the files into a single .zip file.

    • In the .zip file's name, include your NAME and student ID. ex) DogyeomPark_20202467_HW07.zip
    • Submit this .zip file on KLMS.
  • Do not submit a printed version of your code, as it will not be graded.

Problem 1: Load the dataset¶

We will create a convolutional neural network to classify images of berries, birds, dogs, and flowers. To get started, we need to download the dataset. This dataset will be utilized for both Problem 2 and Problem 3.

(1) Load the provided dataset.

In [ ]:
import cv2
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
In [ ]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [ ]:
train_image = np.load('/content/drive/MyDrive/DL_Colab/DL_data/TL_CAM_train_image.npy')
train_label = np.load('/content/drive/MyDrive/DL_Colab/DL_data/TL_CAM_train_label.npy')
test_image = np.load('/content/drive/MyDrive/DL_Colab/DL_data/TL_CAM_test_image.npy')
test_label = np.load('/content/drive/MyDrive/DL_Colab/DL_data/TL_CAM_test_label.npy')
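
Before training, it is worth confirming that the loaded arrays have the shapes and value ranges the rest of the notebook assumes. A minimal sketch using synthetic stand-in arrays (the real `train_image` and `train_label` come from the `.npy` files above, so the sizes here are hypothetical):

```python
import numpy as np

# Stand-in arrays with the shapes the rest of the notebook assumes:
# 224x224 RGB images and integer class labels in {0, 1, 2, 3}.
train_image = np.random.rand(20, 224, 224, 3).astype(np.float32)
train_label = np.random.randint(0, 4, size=20)

# Sanity checks the loaded data should satisfy before training.
assert train_image.shape[1:] == (224, 224, 3), "VGG16 expects 224x224 RGB inputs"
assert set(np.unique(train_label)) <= {0, 1, 2, 3}, "four classes expected"

print(train_image.shape, train_label.shape)  # (20, 224, 224, 3) (20,)
```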

(2) Visualize ten randomly selected images from the training dataset.

In [ ]:
index = np.random.choice(train_image.shape[0], 10, replace = False)

plt.figure(figsize = (15, 6))

loc = 1
for i in index:
    plt.subplot(2, 5, loc)
    plt.imshow(train_image[i])
    plt.xticks([])
    plt.yticks([])
    loc += 1

plt.show()

Problem 2: Transfer Learning¶

We will utilize the VGG16 architecture to train on our dataset. As shown in the image below, VGG16 consists of 16 weight layers (13 convolutional and 3 fully connected) with a substantial number of trainable parameters. Fortunately, deep learning libraries such as TensorFlow, Keras, and PyTorch offer models pre-trained on ImageNet, sparing us from designing and training a model from scratch.
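
One detail to keep in mind when reusing ImageNet weights: the pre-trained VGG16 was trained on inputs transformed by `tf.keras.applications.vgg16.preprocess_input`, which converts RGB to BGR and subtracts the per-channel ImageNet means. A numpy-only sketch of that transform (the function name is hypothetical; the means are the published BGR values):

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by VGG16's "caffe"-style
# preprocessing (what tf.keras.applications.vgg16.preprocess_input applies).
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_preprocess_sketch(rgb_batch):
    """Mimic VGG16 preprocessing: RGB -> BGR, then subtract channel means.

    rgb_batch: float array of shape (N, H, W, 3) with values in [0, 255].
    """
    bgr = rgb_batch[..., ::-1]        # flip channel order RGB -> BGR
    return bgr - IMAGENET_BGR_MEAN    # zero-center each channel

batch = np.full((1, 224, 224, 3), 128.0, dtype=np.float32)
out = vgg16_preprocess_sketch(batch)
print(out.shape)  # (1, 224, 224, 3)
```

If the training images are stored in a different range (e.g. [0, 1]), they would need rescaling to [0, 255] before this step.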


(1) Create a VGG16 model using deep learning libraries, such as TensorFlow, Keras, or PyTorch.

In [ ]:
vgg16_model = tf.keras.applications.vgg16.VGG16()

vgg16_model.summary()
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels.h5
553467096/553467096 [==============================] - 7s 0us/step
Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 224, 224, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 224, 224, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 224, 224, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 112, 112, 64)      0         
                                                                 
 block2_conv1 (Conv2D)       (None, 112, 112, 128)     73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 112, 112, 128)     147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 56, 56, 128)       0         
                                                                 
 block3_conv1 (Conv2D)       (None, 56, 56, 256)       295168    
                                                                 
 block3_conv2 (Conv2D)       (None, 56, 56, 256)       590080    
                                                                 
 block3_conv3 (Conv2D)       (None, 56, 56, 256)       590080    
                                                                 
 block3_pool (MaxPooling2D)  (None, 28, 28, 256)       0         
                                                                 
 block4_conv1 (Conv2D)       (None, 28, 28, 512)       1180160   
                                                                 
 block4_conv2 (Conv2D)       (None, 28, 28, 512)       2359808   
                                                                 
 block4_conv3 (Conv2D)       (None, 28, 28, 512)       2359808   
                                                                 
 block4_pool (MaxPooling2D)  (None, 14, 14, 512)       0         
                                                                 
 block5_conv1 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv2 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv3 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_pool (MaxPooling2D)  (None, 7, 7, 512)         0         
                                                                 
 flatten (Flatten)           (None, 25088)             0         
                                                                 
 fc1 (Dense)                 (None, 4096)              102764544 
                                                                 
 fc2 (Dense)                 (None, 4096)              16781312  
                                                                 
 predictions (Dense)         (None, 1000)              4097000   
                                                                 
=================================================================
Total params: 138357544 (527.79 MB)
Trainable params: 138357544 (527.79 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

(2) Revise the original VGG16 architecture. As shown in the image below, we will modify only the fully connected section. Because we reuse the pre-trained parameters, the feature-extraction layers must remain frozen.




In [ ]:
vgg16_model.trainable = False

vgg16_model.summary()
Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 224, 224, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 224, 224, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 224, 224, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 112, 112, 64)      0         
                                                                 
 block2_conv1 (Conv2D)       (None, 112, 112, 128)     73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 112, 112, 128)     147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 56, 56, 128)       0         
                                                                 
 block3_conv1 (Conv2D)       (None, 56, 56, 256)       295168    
                                                                 
 block3_conv2 (Conv2D)       (None, 56, 56, 256)       590080    
                                                                 
 block3_conv3 (Conv2D)       (None, 56, 56, 256)       590080    
                                                                 
 block3_pool (MaxPooling2D)  (None, 28, 28, 256)       0         
                                                                 
 block4_conv1 (Conv2D)       (None, 28, 28, 512)       1180160   
                                                                 
 block4_conv2 (Conv2D)       (None, 28, 28, 512)       2359808   
                                                                 
 block4_conv3 (Conv2D)       (None, 28, 28, 512)       2359808   
                                                                 
 block4_pool (MaxPooling2D)  (None, 14, 14, 512)       0         
                                                                 
 block5_conv1 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv2 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv3 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_pool (MaxPooling2D)  (None, 7, 7, 512)         0         
                                                                 
 flatten (Flatten)           (None, 25088)             0         
                                                                 
 fc1 (Dense)                 (None, 4096)              102764544 
                                                                 
 fc2 (Dense)                 (None, 4096)              16781312  
                                                                 
 predictions (Dense)         (None, 1000)              4097000   
                                                                 
=================================================================
Total params: 138357544 (527.79 MB)
Trainable params: 0 (0.00 Byte)
Non-trainable params: 138357544 (527.79 MB)
_________________________________________________________________
In [ ]:
# Output of block5_pool, the last layer of the frozen feature extractor
block5_pool_layer = vgg16_model.layers[-5].output

# New trainable head: conv + global average pooling + softmax over 4 classes
conv2d = tf.keras.layers.Conv2D(filters = 1024,
                                kernel_size = (3, 3),
                                activation = 'relu',
                                padding = 'SAME')(block5_pool_layer)

global_average_pooling2d = tf.keras.layers.GlobalAveragePooling2D()(conv2d)

dense = tf.keras.layers.Dense(units = 4, activation = 'softmax')(global_average_pooling2d)

model = tf.keras.Model(inputs = vgg16_model.inputs, outputs = dense)

model.summary()
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 224, 224, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 224, 224, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 224, 224, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 112, 112, 64)      0         
                                                                 
 block2_conv1 (Conv2D)       (None, 112, 112, 128)     73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 112, 112, 128)     147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 56, 56, 128)       0         
                                                                 
 block3_conv1 (Conv2D)       (None, 56, 56, 256)       295168    
                                                                 
 block3_conv2 (Conv2D)       (None, 56, 56, 256)       590080    
                                                                 
 block3_conv3 (Conv2D)       (None, 56, 56, 256)       590080    
                                                                 
 block3_pool (MaxPooling2D)  (None, 28, 28, 256)       0         
                                                                 
 block4_conv1 (Conv2D)       (None, 28, 28, 512)       1180160   
                                                                 
 block4_conv2 (Conv2D)       (None, 28, 28, 512)       2359808   
                                                                 
 block4_conv3 (Conv2D)       (None, 28, 28, 512)       2359808   
                                                                 
 block4_pool (MaxPooling2D)  (None, 14, 14, 512)       0         
                                                                 
 block5_conv1 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv2 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv3 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_pool (MaxPooling2D)  (None, 7, 7, 512)         0         
                                                                 
 conv2d (Conv2D)             (None, 7, 7, 1024)        4719616   
                                                                 
 global_average_pooling2d (  (None, 1024)              0         
 GlobalAveragePooling2D)                                         
                                                                 
 dense (Dense)               (None, 4)                 4100      
                                                                 
=================================================================
Total params: 19438404 (74.15 MB)
Trainable params: 4723716 (18.02 MB)
Non-trainable params: 14714688 (56.13 MB)
_________________________________________________________________

(3) Train the modified VGG16 model.

In [ ]:
model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = 'accuracy')

model.fit(train_image, train_label, batch_size = 128, epochs = 5)
Epoch 1/5
19/19 [==============================] - 47s 1s/step - loss: 1.7230 - accuracy: 0.5192
Epoch 2/5
19/19 [==============================] - 10s 515ms/step - loss: 0.4351 - accuracy: 0.8533
Epoch 3/5
19/19 [==============================] - 10s 530ms/step - loss: 0.3287 - accuracy: 0.8742
Epoch 4/5
19/19 [==============================] - 10s 542ms/step - loss: 0.2418 - accuracy: 0.9104
Epoch 5/5
19/19 [==============================] - 11s 551ms/step - loss: 0.2242 - accuracy: 0.9150
Out[ ]:
<keras.src.callbacks.History at 0x7d1270167550>

(4) Print your accuracy with the test dataset.

In [ ]:
test_loss, test_acc = model.evaluate(test_image, test_label, verbose = 0)

print('Accuracy: {:.2f} %'.format(test_acc*100))
Accuracy: 90.75 %

Problem 3: Class Activation Maps¶

(1) Visualize the Class Activation Mapping (CAM) results as presented in the provided figure.
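
CAM computes, for a chosen class $c$, a weighted sum of the last convolutional feature maps, $\text{CAM}_c(x, y) = \sum_k w_k^c f_k(x, y)$, where $w^c$ is column $c$ of the final dense layer's weight matrix. A minimal numpy sketch with random stand-in tensors whose shapes match the modified model above (7×7×1024 features, 4 classes; the class index is hypothetical):

```python
import numpy as np

# Stand-in tensors: last conv features (1, 7, 7, 1024) and the
# dense-layer weight matrix (1024, 4), as in the modified model.
features = np.random.rand(1, 7, 7, 1024).astype(np.float32)
weights = np.random.rand(1024, 4).astype(np.float32)

# CAM for every class at once: weighted sum over the channel axis.
# (1, 7, 7, 1024) @ (1024, 4) -> (1, 7, 7, 4)
cam_all = features @ weights

# Pick the map for one class and normalize it to [0, 1] for display.
cls = 2  # hypothetical predicted class index
cam = cam_all[0, :, :, cls]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

print(cam_all.shape, cam.shape)  # (1, 7, 7, 4) (7, 7)
```

The code below does the same thing inside the Keras graph: `tf.matmul` of the conv-layer output with the dense weights, wrapped in a second model, then upsampled to the image size for overlay.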

In [ ]:
def classname(n):
    if n == 0:
        return 'berry'
    elif n == 1:
        return 'bird'
    elif n == 2:
        return 'dog'
    else:
        return 'flower'

# Last convolutional layer (index -3) and the dense layer's weight matrix
conv_layer = model.get_layer(index = -3)
fc_layer   = model.layers[-1].get_weights()[0]

# CAM model: weighted sum of the conv feature maps over the channel axis
# (None, 7, 7, 1024) x (1024, 4) -> (None, 7, 7, 4)
my_map = tf.matmul(conv_layer.output, fc_layer)
CAM = tf.keras.Model(inputs = model.inputs, outputs = my_map)

idx = np.random.choice(test_image.shape[0], 3, replace = False)

cam_x  = []
pred_y = []

for i in idx:
    pred = int(np.argmax(model.predict(test_image[[i]])))
    predCAM = CAM.predict(test_image[[i]])

    # Activation map for the predicted class, upsampled to the image size
    attention = predCAM[:, :, :, pred]
    attention = np.abs(np.reshape(attention, (7, 7)))

    resized_attention = cv2.resize(attention,
                                   (224, 224),
                                   interpolation = cv2.INTER_CUBIC)

    cam_x.append(resized_attention)
    pred_y.append(pred)

plt.figure(figsize = (6, 9))
for i in range(3):
    plt.subplot(3, 2, 2 * i + 1)
    plt.imshow(test_image[idx[i]])
    plt.title('True: {} / Pred: {}'.format(classname(test_label[idx[i]]), classname(pred_y[i])), fontsize = 15)
    plt.axis('off')

    plt.subplot(3, 2, 2 * i + 2)
    plt.imshow(test_image[idx[i]])
    plt.imshow(cam_x[i], 'jet', alpha = 0.5)
    plt.title('Class Activation Map', fontsize = 15)
    plt.axis('off')
plt.show()
1/1 [==============================] - 0s 24ms/step
1/1 [==============================] - 0s 132ms/step
1/1 [==============================] - 0s 22ms/step
1/1 [==============================] - 0s 22ms/step
1/1 [==============================] - 0s 20ms/step
1/1 [==============================] - 0s 23ms/step